[MISC] Switch dedupe contact sort to use Quadrants bitonic sort by hughperkins · Pull Request #2853 · Genesis-Embodied-AI/genesis-world

hughperkins · 2026-05-27T21:31:19Z

Replaces the inlined 15-stage bitonic compare-exchange schedule in func_clamp_prune_and_sort_contacts_coop (phase 1a) with a one-line call to the new Quadrants subgroup primitive:

my_key, my_idx = qd.simt.subgroup.bitonic_sort_kv_tiled(my_key, my_idx, 5)

The primitive (added in quadrants hp/bitonic-sort-kv) is a @qd.func that inlines at compile time and unrolls the same 15 compare-exchange stages this code previously spelled out by hand, so the generated kernel IR on CUDA is bit-identical to the IR before this change. Net change: ~30 lines of hand-rolled bitonic code are removed, the sentinel load + write-back wrapper is unchanged, and the rest of the kernel (clamp + key init + bucket walk + phase 2 + phase 3) is untouched.
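For readers unfamiliar with the schedule, here is a minimal pure-Python sketch of a bitonic key/value sorting network over 2**log2_size simulated lanes. It is not the Quadrants implementation: the names `bitonic_schedule` and `simulate_bitonic_sort_kv` are hypothetical, lanes are modelled as list slots, and the partner read stands in for a subgroup shuffle. It only illustrates why a 32-lane sort needs 15 compare-exchange stages.

```python
def bitonic_schedule(log2_size):
    """Return the (k, j) stage parameters of a bitonic network over 2**log2_size lanes."""
    n = 1 << log2_size
    stages = []
    k = 2
    while k <= n:          # k: size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # j: distance to the partner lane in this stage
            stages.append((k, j))
            j //= 2
        k *= 2
    return stages


def simulate_bitonic_sort_kv(keys, vals, log2_size):
    """Sort (key, value) pairs ascending; ties on key are broken by value."""
    n = 1 << log2_size
    assert len(keys) == len(vals) == n
    keys, vals = list(keys), list(vals)
    for k, j in bitonic_schedule(log2_size):
        # Snapshot the partner's pair for every lane, as a shuffle-XOR would.
        partner = [(keys[lane ^ j], vals[lane ^ j]) for lane in range(n)]
        for lane in range(n):
            ascending = (lane & k) == 0   # sort direction of this lane's block
            is_lower = (lane & j) == 0    # lower or upper lane of the pair
            mine = (keys[lane], vals[lane])
            # The lower lane keeps the minimum in an ascending block,
            # the upper lane keeps it in a descending block.
            keep_min = ascending == is_lower
            chosen = min(mine, partner[lane]) if keep_min else max(mine, partner[lane])
            keys[lane], vals[lane] = chosen
    return keys, vals


# Hypothetical input: a permutation of 0..31 as keys, lane indices as values.
example_keys = [(i * 7) % 32 for i in range(32)]
sorted_keys, sorted_idx = simulate_bitonic_sort_kv(example_keys, list(range(32)), 5)
```

With `log2_size = 5` the schedule has 1 + 2 + 3 + 4 + 5 = 15 stages, which matches the stage count quoted above.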

log2_size = 5 pins the sort to a 32-lane tile, matching the kernel's hard-coded block_dim = _K = 32. Using the tiled form rather than the bare bitonic_sort_kv(...) wrapper keeps the sort width fixed at 32 even on AMDGPU wave64, where the bare wrapper would otherwise sort across all 64 lanes and mix in garbage from the inactive upper half.
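The wave64 concern can be modelled in plain Python. This is only a sketch under stated assumptions: `sort_tiled` and `sort_full_width` are hypothetical helpers, Python's `sorted` stands in for the sorting network, and the "garbage" values are made up to represent whatever the inactive upper 32 lanes might hold.

```python
def sort_tiled(lanes, log2_size):
    """Sort each aligned tile of 2**log2_size lanes independently."""
    tile = 1 << log2_size
    out = []
    for start in range(0, len(lanes), tile):
        out.extend(sorted(lanes[start:start + tile]))
    return out


def sort_full_width(lanes):
    """Sort across every lane of the subgroup, whatever its width."""
    return sorted(lanes)


# Lower 32 lanes: valid keys in 100..131 (hypothetical values).
valid = [(i * 11) % 32 + 100 for i in range(32)]
# Upper 32 lanes: stand-in for garbage held by inactive lanes.
garbage = [(i * 5) % 32 for i in range(32)]
wave64 = valid + garbage

tiled = sort_tiled(wave64, 5)     # the 32-lane tile holding real data stays self-contained
full = sort_full_width(wave64)    # garbage migrates into the lower 32 lanes
```

In this model the first 32 entries of `tiled` are exactly the sorted valid keys, while the first 32 entries of `full` are all garbage.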

Requires the matching quadrants change to be installed (the public symbol qd.simt.subgroup.bitonic_sort_kv_tiled is added by that PR).

7 lines of prose -> 3. Same intent: explain why we use the tiled form with ``log2_size = 5`` rather than the bare ``bitonic_sort_kv`` wrapper.

Pair ``_K = qd.static(32)`` with ``_LOG2_K = qd.static(5)`` at the top of the kernel and pass ``_LOG2_K`` into ``bitonic_sort_kv_tiled``. The relationship between the sort width and ``_K`` is now visible at the binding site instead of being a magic 5 sitting next to ``_K = 32``.

With ``_K`` and ``_LOG2_K`` defined together at the top of the kernel and ``_LOG2_K`` flowing straight into ``bitonic_sort_kv_tiled``, the explainer ("pins the tiled sort to _K lanes on every backend ...") just restates what the names already convey.

``qd.static(32)`` is just the int ``32`` at compile time, so ``_K.bit_length() - 1`` evaluates to ``5`` and keeps _K and _LOG2_K in sync if _K is ever retuned.

``qd.static()`` is a no-op on Python int literals -- it evaluates its argument at compile time, and a plain ``32`` is already a Python compile-time int. Several other Genesis solver files wrap kernel-scope ``BLOCK_DIM`` / ``WARP_SIZE`` / ``_K`` constants this way as a defensive marker, but it doesn't change codegen and the bare ints read more directly.

… time Reverts the qd.static() removal that broke kernel compilation. Without qd.static, _K = 32 becomes a kernel-local Expr rather than a Python int, so _K.bit_length() is routed to quadrants.lang.matrix_ops by the AST transformer and fails with AttributeError. Wrapping in qd.static keeps _K a compile-time Python int, letting int.bit_length() evaluate to a folded literal.
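The failure mode can be illustrated with a toy model. Nothing below is Quadrants code: `TracedExpr` and `static` are hypothetical stand-ins for a kernel-local expression node and a compile-time marker, and the sketch does not reproduce the AST transformer's routing to quadrants.lang.matrix_ops. It only shows that `bit_length()` is a method of Python ints, so it is available when the value stays a compile-time int and missing once the value has become some other object.

```python
class TracedExpr:
    """Hypothetical stand-in for a kernel-local expression (not a Quadrants class)."""

    def __init__(self, value):
        self.value = value


def static(value):
    """Hypothetical compile-time marker: hands the Python object back unchanged."""
    return value


def log2_of(k):
    # Works only if k is a real Python int.
    return k.bit_length() - 1


k_static = static(32)        # still a Python int
k_traced = TracedExpr(32)    # no longer an int, so it has no bit_length()

static_result = log2_of(k_static)
try:
    log2_of(k_traced)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
```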

github-actions · 2026-06-04T22:11:16Z

⚠️ Abnormal Benchmark Result Detected ➡️ Report

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78a6b9fed0

hughperkins · 2026-06-05T14:07:12Z

🙌

hughperkins added 6 commits May 27, 2026 12:25

collider: shorten the bitonic-sort comment

ddf6809

collider: derive _LOG2_K from _K instead of hardcoding 5

47bbe19

duburcqa reviewed Jun 1, 2026

Comment thread genesis/engine/solvers/rigid/collider/contact.py

duburcqa previously approved these changes Jun 1, 2026

hughperkins added 2 commits June 4, 2026 08:11

Merge remote-tracking branch 'origin/main' into hp/use-bitonic-sort-kv

8c07996

1.0.2

0af18cc

hughperkins dismissed duburcqa’s stale review via 0af18cc June 4, 2026 12:14

hugh and others added 2 commits June 4, 2026 14:30

Merge branch 'main' into hp/use-bitonic-sort-kv

78a6b9f

hughperkins marked this pull request as ready for review June 4, 2026 22:16

hughperkins requested a review from YilingQiao as a code owner June 4, 2026 22:16

chatgpt-codex-connector Bot reviewed Jun 4, 2026

Comment thread genesis/engine/solvers/rigid/collider/contact.py

duburcqa merged commit 2f219c4 into Genesis-Embodied-AI:main Jun 5, 2026
21 of 23 checks passed

hughperkins deleted the hp/use-bitonic-sort-kv branch June 5, 2026 14:07

duburcqa merged 10 commits into Genesis-Embodied-AI:main from hughperkins:hp/use-bitonic-sort-kv
